AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.41)

Neural Information Processing SystemsDec-24-2025, 20:26:28 GMT

Early-Learning Regularization Prevents Memorization of Noisy Labels

We propose a novel framework to perform classification via deep learning in the presence of noisy annotations. When trained on noisy labels, deep neural networks have been observed to first fit the training data with clean labels during an early learning phase, before eventually memorizing the examples with false labels. We prove that early learning and memorization are fundamental phenomena in high-dimensional classification tasks, even in simple linear models, and give a theoretical explanation in this setting. Motivated by these findings, we develop a new technique for noisy classification tasks, which exploits the progress of the early learning phase. In contrast with existing approaches, which use the model output during early learning to detect the examples with clean labels, and either ignore or attempt to correct the false labels, we take a different route and instead capitalize on early learning via regularization. There are two key elements to our approach. First, we leverage semi-supervised learning techniques to produce target probabilities based on the model outputs. Second, we design a regularization term that steers the model towards these targets, implicitly preventing memorization of the false labels. The resulting framework is shown to provide robustness to noisy annotations on several standard benchmarks and real-world datasets, where it achieves results comparable to the state of the art.

early-learning regularization prevent memorization, false label, name change, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.59)

Neural Information Processing SystemsAug-16-2025, 08:26:14 GMT

c5d736809766d46260d816d8dbc9eb44-Supplemental.pdf

generator, input signal, synthesized signal, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.73)

Babaiee, Zahra, Kiasari, Peyman M., Rus, Daniela, Grosu, Radu

Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models

arXiv.org Artificial IntelligenceJun-9-2025

Recent advancements in multimodal large language models have driven breakthroughs in visual question answering. Yet, a critical gap persists, `conceptualization'-the ability to recognize and reason about the same concept despite variations in visual form, a basic ability of human reasoning. To address this challenge, we introduce the Visual Graph Arena (VGA), a dataset featuring six graph-based tasks designed to evaluate and improve AI systems' capacity for visual abstraction. VGA uses diverse graph layouts (e.g., Kamada-Kawai vs. planar) to test reasoning independent of visual form. Experiments with state-of-the-art vision models and multimodal LLMs reveal a striking divide: humans achieved near-perfect accuracy across tasks, while models totally failed on isomorphism detection and showed limited success in path/cycle tasks. We further identify behavioral anomalies suggesting pseudo-intelligent pattern matching rather than genuine understanding. These findings underscore fundamental limitations in current AI models for visual understanding. By isolating the challenge of representation-invariant reasoning, the VGA provides a framework to drive progress toward human-like conceptualization in AI visual models. The Visual Graph Arena is available at: \href{https://vga.csail.mit.edu/}{vga.csail.mit.edu}

large language model, machine learning, natural language, (21 more...)

2506.06242

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.44)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsJan-13-2025, 20:18:11 GMT

Early-Learning Regularization Prevents Memorization of Noisy Labels

We propose a novel framework to perform classification via deep learning in the presence of noisy annotations. When trained on noisy labels, deep neural networks have been observed to first fit the training data with clean labels during an "early learning" phase, before eventually memorizing the examples with false labels. We prove that early learning and memorization are fundamental phenomena in high-dimensional classification tasks, even in simple linear models, and give a theoretical explanation in this setting. Motivated by these findings, we develop a new technique for noisy classification tasks, which exploits the progress of the early learning phase. In contrast with existing approaches, which use the model output during early learning to detect the examples with clean labels, and either ignore or attempt to correct the false labels, we take a different route and instead capitalize on early learning via regularization. There are two key elements to our approach.

early learning, early-learning regularization prevent memorization, false label, (5 more...)

Industry: Education > Educational Setting > Preschool (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

arXiv.org Artificial IntelligenceJun-10-2024

A Multi-module Robust Method for Transient Stability Assessment against False Label Injection Cyberattacks

Wang, Hanxuan, Lu, Na, Liu, Yinhong, Wang, Zhuqing, Wang, Zixuan

The success of deep learning in transient stability assessment (TSA) heavily relies on high-quality training data. However, the label information in TSA datasets is vulnerable to contamination through false label injection (FLI) cyberattacks, resulting in degraded performance of deep TSA models. To address this challenge, a Multi-Module Robust TSA method (MMR) is proposed to rectify the supervised training process misguided by FLI in an unsupervised manner. In MMR, a supervised classification module and an unsupervised clustering module are alternatively trained to improve the clustering friendliness of representation leaning, thereby achieving accurate clustering assignments. Leveraging the clustering assignments, we construct a training label corrector to rectify the injected false labels and progressively enhance robustness and resilience against FLI. However, there is still a gap on accuracy and convergence speed between MMR and FLI-free deep TSA models. To narrow this gap, we further propose a human-in-the-loop training strategy, named MMR-HIL. In MMR-HIL, potential false samples can be detected by modeling the training loss with a Gaussian distribution. From these samples, the most likely false samples and most ambiguous samples are re-labeled by a TSA experts guided bi-directional annotator and then subjected to penalized optimization, aimed at improving accuracy and convergence speed. Extensive experiments indicate that MMR and MMR-HIL both exhibit powerful robustness against FLI in TSA performance. Moreover, the contaminated labels can also be effectively corrected, demonstrating superior resilience of the proposed methods.

assignment, false label, mmr -hil, (15 more...)

2406.06744

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Energy > Power Industry (0.92)
Government > Military > Cyberwarfare (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Struski, Łukasz, Pardyl, Adam, Tabor, Jacek, Zieliński, Bartosz

ProPML: Probability Partial Multi-label Learning

arXiv.org Artificial IntelligenceMar-12-2024

Abstract--Partial Multi-label Learning (PML) is a type of weakly supervised learning where each training instance corresponds to a set of candidate labels, among which only some are true. ProPML outperforms existing approaches, especially for high noise in a candidate set. Pineapple Deep neural networks are highly effective in many practical applications. However, their success is heavily dependent on the availability of a large dataset with accurate labeling. Figure 1: In partial multiple-label learning, each training instance Obtaining such datasets is challenging due to the cost and corresponds to a set of candidate labels.

dataset, learning, propml, (14 more...)

2403.07603

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Poland (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

arXiv.org Artificial IntelligenceJul-31-2023

Deception Abilities Emerged in Large Language Models

Hagendorff, Thilo

Large language models (LLMs) are currently at the forefront of intertwining artificial intelligence (AI) systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and utilizing this ability to bypass monitoring efforts. As a prerequisite to this, LLMs need to possess a conceptual understanding of deception strategies. This study reveals that such strategies emerged in state-of-the-art LLMs, such as GPT-4, but were non-existent in earlier LLMs. We conduct a series of experiments showing that state-of-the-art LLMs are able to understand and induce false beliefs in other agents, that their performance in complex deception scenarios can be amplified utilizing chain-of-thought reasoning, and that eliciting Machiavellianism in LLMs can alter their propensity to deceive. In sum, revealing hitherto unknown machine behavior in LLMs, our study contributes to the nascent field of machine psychology.

large language model, machine learning, natural language, (20 more...)

2307.16513

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
South America > Uruguay > Artigas > Artigas (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJun-15-2023

Partial-Label Regression

Cheng, Xin, Wang, Deng-Bao, Feng, Lei, Zhang, Min-Ling, An, Bo

Partial-label learning is a popular weakly supervised learning setting that allows each training example to be annotated with a set of candidate labels. Previous studies on partial-label learning only focused on the classification setting where candidate labels are all discrete, which cannot handle continuous labels with real values. In this paper, we provide the first attempt to investigate partial-label regression, where each training example is annotated with a set of real-valued candidate labels. To solve this problem, we first propose a simple baseline method that takes the average loss incurred by candidate labels as the predictive loss. The drawback of this method lies in that the loss incurred by the true label may be overwhelmed by other false labels. To overcome this drawback, we propose an identification method that takes the least loss incurred by candidate labels as the predictive loss. We further improve it by proposing a progressive identification method to differentiate candidate labels using progressively updated weights for incurred losses. We prove that the latter two methods are model-consistent and provide convergence analyses. Our proposed methods are theoretically grounded and can be compatible with any models, optimizers, and losses. Experiments validate the effectiveness of our proposed methods.

artificial intelligence, inductive learning, machine learning, (18 more...)

2306.08968

Country:

Asia > China > Chongqing Province > Chongqing (0.04)
Asia > Singapore (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Umer, Muhammad, Dawson, Glenn, Polikar, Robi

Targeted Forgetting and False Memory Formation in Continual Learners through Adversarial Backdoor Attacks

arXiv.org Artificial IntelligenceFeb-17-2020

Artificial neural networks are well-known to be susceptible to catastrophic forgetting when continually learning from sequences of tasks. Various continual (or "incremental") learning approaches have been proposed to avoid catastrophic forgetting, but they are typically adversary agnostic, i.e., they do not consider the possibility of a malicious attack. In this effort, we explore the vulnerability of Elastic Weight Consolidation (EWC), a popular continual learning algorithm for avoiding catastrophic forgetting. We show that an intelligent adversary can bypass the EWC's defenses, and instead cause gradual and deliberate forgetting by introducing small amounts of misinformation to the model during training. We demonstrate such an adversary's ability to assume control of the model via injection of "backdoor" attack samples on both permuted and split benchmark variants of the MNIST dataset. Importantly, once the model has learned the adversarial misinformation, the adversary can then control the amount of forgetting of any task. Equivalently, the malicious actor can create a "false memory" about any task by inserting carefully-designed backdoor samples to any fraction of the test instances of that task. Perhaps most damaging, we show this vulnerability to be very acute; neural network memory can be easily compromised with the addition of backdoor samples into as little as 1% of the training data of even a single task.

attacker, backdoor, training data, (17 more...)